In this lab you will learn
Infectious disease dynamics is analogous to predator-prey dynamics, except the timescales of predator and prey are very different.
We can model infectious diseases using the SIR model, which divides the host population into susceptible, infected, and recovered classes.
The fate of the disease – whether it grows to epidemic levels or dies out quickly – is governed by its basic reproductive number, \(R_0\).
Herd immunity occurs when the susceptible class is too small for the disease to continue to spread.
Vaccines precipitate herd immunity by removing susceptibles from the host population.
We can use the SIR model to forecast the cumulative infection levels of COVID-19 in the United States.
Some infectious diseases are transmitted indirectly via vectors. This is the case of malaria, one the world’s mostly deadly diseases.
Modeling malaria helps inform management and mitigation plans.
Parasites can manipulate their hosts into behavior that benefits them at the host’s expense.
Parasitism is so widespread in nature that even parasites have their own parasites.
Note: Before starting today’s lab, please install
package lamW using the command
install.packages('lamW') AND package Rcpp
using install.packages('Rcpp')
library(tidyverse)
library(gridExtra) ## for plotting function grid.arrange()
library(deSolve) ## for predator-prey-resource ODE model
library(magrittr) ## for pipe symbol %<>%
library(knitr) ## for table-viewing function kable()
library(lamW) ## for the Lambert function lambertW0
theme_set(theme_bw())
theme_update(
panel.grid = element_blank(),
aspect.ratio = 1
)
Figure: The scourge of 2020.
The basic SIR model above is of course far too simple to capture the complexities of a real epidemic. With that important caveat in mind, let us apply what we have learned from the SIR model in the context of the COVID-19 pandemic. (Also, check out this paper, published in November 2020, which accurately predicts infections in 10 major American cities using an only slightly more complicated model calibrated with data on the hourly movements of people gathered from cell phones!)
First, we must load the data, which I pulled from the CDC website for the dates between January 22, 2020 and March 1, 2021. You can download the data from my GitHub using the R code
github_address =
'https://raw.githubusercontent.com/rafaeldandrea/BIO-356/master/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv'
covid_data =
read_csv(github_address, col_types = cols()) %>%
as_tibble %>%
mutate(submission_date = as.Date(submission_date, tryFormats = '%m/%d/%Y')) %>%
arrange(state, submission_date)
We will also need population data. You can get it directly from the US Census Bureau:
census_address = 'https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/national/totals/nst-est2019-alldata.csv'
census_data =
read.csv(census_address) %>%
as_tibble
Those datasets aggregate data from all states. For the sake of example, let us focus on New York State. Thus, we must filter the data:
state_abb = 'NY'
state_name = state.name[state.abb == state_abb]
state_pop =
census_data %>%
filter(NAME == state_name) %>%
pull(POPESTIMATE2019)
covid_state =
covid_data %>%
filter(state == state_abb)
Here is what the NYS covid dataset looks like
covid_state %>%
select(submission_date, state, tot_cases, new_case, tot_death, new_death) %>%
filter(submission_date > as.Date('2020-04-12'))
## # A tibble: 322 × 6
## submission_date state tot_cases new_case tot_death new_death
## <date> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2020-04-13 NY 88268 2782 2661 218
## 2 2020-04-14 NY 91743 3475 2882 221
## 3 2020-04-15 NY 95477 3734 3081 199
## 4 2020-04-16 NY 99138 3661 3247 166
## 5 2020-04-17 NY 102290 3152 3560 313
## 6 2020-04-18 NY 105548 3258 3562 2
## 7 2020-04-19 NY 108350 2802 3787 225
## 8 2020-04-20 NY 110706 2356 3940 153
## 9 2020-04-21 NY 112365 1659 4107 167
## 10 2020-04-22 NY 114784 2419 4260 153
## # ℹ 312 more rows
Let us now plot the daily number of new deaths for NY:
covid_state %>%
ggplot(aes(submission_date, new_death)) +
geom_point() +
ggtitle(state_name) +
labs(
x = 'Date',
y = 'New deaths'
)
Notice the presence of several outliers around April 2020. Some are obvious mistakes: it is of course impossible to have negative deaths, and very implausible that a single day would have almost ten times as many deaths as the days immediately before and after it. We can infer that these outliers are mistakes in the records.
One way to deal with suspicious data points, especially if they are obvious measurement errors, is to remove them. We will do so using the Hampel filter, which removes data points that depart from the median of the surrounding data points by more than 3 standard deviations. The underlying principle is that if a data point deviates from the bulk of the distribution by an inordinate amount, it is probably an error in measurement or record keeping.
first_day = min(covid_state$submission_date) # 20
last_day = max(covid_state$submission_date) # 250
covid_state %<>%
mutate(
time_bracket = findInterval(submission_date, vec = seq(first_day, last_day, by = 7))
) %>%
group_by(time_bracket) %>%
slice(
which(
abs(new_death - median(new_death)) <= 3 * mad(new_death)
)
) %>%
ungroup
Note: Some data points that deviate considerably from the rest may reflect truly unusual days. Since our analyses of these data is not meant for publication or policymaking, we won’t worry too much about it. But you should know that removing data outliers from statistical analyses is controversial. The safest thing to do is make sure your analysis is robust to outliers – ie, its results are not drastically impacted by them.
After removing the outliers, the data looks thus:
plot_deaths =
covid_state %>%
ggplot(aes(submission_date, new_death)) +
geom_point() +
ggtitle(state_name) +
labs(
x = 'Date',
y = 'New deaths'
)
plot_deaths
We can add a smooth curve to these data to aid the eye in identifying the general characteristics of the disease epidemiology. For that we’ll use loess regression. (LOESS stands for locally estimated scatterplot smoothing. It is similar in spirit to linear regression, except it fits a higher-degree polynomial to local subsets of the data. If you are interested in the gory details, you can find them on Wikipedia.)
plot_deaths +
geom_smooth(method = 'loess', formula = y ~ x, span = .1, color = 'red', se = FALSE)
Comparing this curve to the \(I\) (green) curve in the SIR model above, we see similarities and differences. Like the \(I\) curves, we see that infections start low, then rise, then peak, and finally fall back down. However, we see not one but two peaks! The SIR model cannot explain that. What happened?
The main discrepancy is that the SIR model assumes constant rates of infection (\(\beta\)), disease-caused mortality (\(\mu\)), and recovery (\(\gamma\)), whereas in the case of COVID-19 those rates are very much not constant.
For example, we can tell that mortality was not constant by comparing the curves of new cases and new deaths:
plot_cases =
covid_state %>%
ggplot(aes(submission_date, new_case)) +
geom_point() +
ggtitle(state_name) +
geom_smooth(method = 'loess', formula = y ~ x, span = .1, color = 'darkgreen', se = FALSE) +
labs(
x = 'Date',
y = 'New cases'
)
gridExtra::grid.arrange(
plot_cases +
ggtitle('NY - cases'),
plot_deaths +
geom_smooth(method = 'loess', formula = y ~ x, span = .1, color = 'red', se = FALSE) +
ggtitle('NY - deaths'),
nrow = 1
)
Q14. Based on the plots above, do you conclude that covid-related mortality increased or decreased between the spring of 2020 and winter of 2020-2021? Explain your answer.
Q15. What are some reasons for this change in mortality between the first and second “waves” of COVID-19 in NYS? (Make sure to check the timeline of Covid 19 in 2020-2021, which was largely prior to vaccines. You might want to look up news articles of the time, Wikipedia, and cite your sources)
We may intuitively guess that NYS was hit with two “waves” of disease, one in the Spring of 2020 and one in the 2020-2021 Winter. The second wave of infection was worse than the first, as we can see from the new cases curve above.
Q16. What does that say about \(\beta\) in the first and second waves, and how could we explain that change? (Make sure to check the timeline of Covid 19 in 2020-2021, which was largely prior to vaccines. You might want to look up news articles of the time, Wikipedia, and cite your sources)
Q17. a) Plot the total number of cases in NYS using the code below. What is this curve showing? b) Why is it strictly non-decreasing? c) How do you interpret the two parts of the curve that show a steep rise, and how do you interpret the plateau in between those two legs? d) Based on the plot, how many people do you estimate had contracted Covid by the end of the first wave?
plot_total_cases =
covid_state %>%
ggplot(aes(submission_date, tot_cases / 1e3)) +
theme(panel.grid.major.y = element_line(color = 'grey80')) +
geom_line() +
labs(
x = 'Date',
y = 'Total number of cases in NYS (thousands)'
) +
scale_y_continuous(breaks = seq(0, 1000, by = 50))
Despite this bimodal behavior, which is incompatible with our SIR model, we can still draw some insight by focusing on the first wave (between late March and early August 2020), when the disease was new and every member of the population was initially susceptible. We will make the tacit assumption that during that period, the rates were relatively constant.
plot_deaths +
geom_smooth(method = 'loess', formula = y ~ x, span = .1, color = 'red', se = FALSE) +
coord_cartesian(xlim=as.Date(c('2020-01-22', '2020-08-15'))) +
scale_x_continuous(
breaks = as.Date(c(paste0('2020-0', 2:8 ,'-01'))),
labels = paste0(month.abb[2:8], ' 1')
)
Now this curve resembles the green \(I\) curves of infected individuals in the population. Notice it is not the same curve, however, as we are plotting above the number of new daily deaths, not the number or proportion of people currently infected with the disease. Nevertheless, according to our model, these quantities are proportional to each other (i.e. number of people who die of COVID-19 per unit time are proportional to the number of people infected with it). As such, we can use the day on which the number of daily deaths peaks as an indicator of the day when the \(S\) class reached the critical value \(\frac{1}{R_0}\), and then estimate \(R_0\) by solving the expression \(S_c = 1 - \frac{J}{N} = \frac{1}{R_0}\) for \(R_0\), where \(J\) is the total number of recorded cases by the critical day, and \(N\) is the population of the state of New York.
| peak date | total cases by peak date (thousands) | state population (millions) | Sc | R0 |
|---|---|---|---|---|
| 2020-04-13 | 88.3 | 19.5 | 0.9954626 | 1.004558 |
As the table shows, The basic reproductive number of the disease in New York is estimated at \(R_0 \simeq 1.0046\), meaning the average sick individual infects an additional 1.0046 people before recovering or dying.
We can use this information to predict the total number of cases by the end of the first wave of the epidemic, which upon visual inspection of the plot above we can place sometime between July and August 2020. According to the data, by July 1st 2020 there had been a total of 178 thousand cases in New York State, and by August 1st that number is 190 thousand.
The following box provides the mathematical justification for what we are going to do. You can skip it if you are not interested in the extra-credit question at the end of it.
How to estimate the total number of people expected to get Covid using the SIR model
Recall our SIR model,
\[ \begin{eqnarray} \frac{dS}{dt} &=& -\beta SI \\ \frac{dI}{dt} &=& \beta SI - \gamma I - \mu I \\ \frac{dR}{dt} &=& \gamma I \end{eqnarray} \]
By algebraically manipulating the \(S\) and \(R\) equations, we notice a correspondence between the dynamics of susceptible and recovered individuals:
\[ -\frac{1}{\beta}\frac{1}{S}\frac{dS}{dt} = I = \frac{1}{\gamma}\frac{dR}{dt} \]
Recalling the derivative of the logarithm of a function, \(\frac{d\log y}{dx} = \frac{1}{y}\) and the chain rule, and further manipulating constants, we can write:
\[ \frac{d}{dt}\log S = \frac{d}{dt}\left(-\frac{\beta}{\gamma}R\right) \]
Integrating gives
\[ \log S = -\frac{\beta}{\gamma}R + c \]
Where \(c\) is an integration constant. We can find its value by considering the initial conditions: when the disease first invades at time \(t=0\), no one has been infected yet, so 100% of the population is susceptible (\(S(0)=1\)) and no one has recovered yet (\(R(0)=0\)). Plugging these values in the expression above, we get \(\log 1 = -\frac{\beta}{\gamma}0 + c\), from which we conclude \(c=0\).
We can get rid of the logarithm by exponentiating on both sides:
\[ S = \exp\left(-\frac{\beta}{\gamma}R\right) \]
This gives us the proportion of susceptible individuals as a function of the proportion of recovered individuals at any given time. That’s a good start, but we are interested in estimating the total number of individuals who have ever contracted the disease by the time the epidemic is over. To find out, we must calculate \(S_\infty\), i.e., the total proportion of the population who are still susceptible in the limit \(t \rightarrow \infty\).
To make progress here, we must make an approximation, which in the case of COVID-19 is well justified: even though covid is a serious disease, most people who catch it recover. The World Health Organization estimates that the infection fatality rate (i.e. the proportion of people infected who eventually die) in the US is somewhere between 0.5% and 1%. Since the number of people who die of COVID-19 is much smaller than the number who recover, we can approximate \(R_\infty = 1 - S_\infty\). In words, everyone who ever got sick recovered. Also under this approximation, we can rewrite the ratio \(\frac{\beta}{\gamma}\) in terms of the basic reproductive number of the disease, \(R_0\). Recall that in general, \(R_0=\frac{\beta}{\gamma+\mu}\). If mortality \(\mu\) is much smaller than recovery rate \(\gamma\), we can approximate \(\frac{\beta}{\gamma} = R_0\).
So we have
\[ S_\infty = \exp\left(-\frac{\beta}{\gamma}R_\infty\right)=\exp\left(-R_0(1-S_\infty)\right) \]
This is a transcendental equation, meaning it relates our variable of interest (\(S_\infty\)) to itself via a non-algebraic function. There isn’t a systematic way to solve these types of equation, but lucky for us, this one has already been solved (as you can check for yourself by typing it on Wolfram alpha).
The solution is
\[ S_\infty = -\frac{1}{R_0}W(-R_0 e^{-R_0}) \] where \(W()\) is the main branch of the Lambert W function, also called the product logarithm function.
We are finally ready to estimate the total number of people who will ever catch the disease by the time the epidemic is over.
Bonus Q: Use the results derived in this box to write the total number of people who will ever catch the disease by the time the epidemic is over. Write your answer in terms of the disease’s basic reproductive number \(R_0\) and the population size \(N\).(We are looking for a mathematical expression, not a number.)
The R code below defines a function that receives as arguments the population of the state and the estimated \(R_0\), which we obtained above, and outputs the SIR-model prediction (in thousands of people) for the number of individuals expected to have contracted COVID-19 by the end of the first wave in New York State.
final_infection =
function(N, R0){
S_init = 1 - 1 / N
S_final = - 1 / R0 * lambertW0(-S_init * R0 * exp(-R0))
return(N /1e3 * (1 - S_final))
}
Does the SIR model do a good job of predicting the total number of infections in the first wave? You will find that out in Q18.
Q18. a) Use the final_infection function above
to find the SIR model prediction for the total number of cases by the
end of the first infection wave, based on the estimated \(R_0\) and the population of NYS. b) Edit
the code below to add a red horizontal line to the plot of total cases
over time. That line will mark the prediction you calculated in (a). Did
the prediction match the data?
## Basic reproduction number of COVID-19 (estimated above)
R0 = ???
## Population of NYS
N = 19453561
## You can find this number by running the function final_infection above, using as arguments the population of NYS and the estimated R0.
predicted_total_infection = ???
## Plotting your results
plot_total_cases +
geom_hline(aes(yintercept = predicted_total_infection), color = rgb(153 / 255, 0, 0))
The plot below shows analogous results for all states. Since some states are much more populous than others, to facilitate the comparison we are plotting new cases per thousand residents.
One thing that becomes immediately clear is that COVID-19 did not have the same trajectory across the country. Some states had a single peak of infection, while other states had more than two. And among those which had two peaks, like NY, the positions of the peaks on the x-axis are not always the same.
Q19. Being by far the most populous state, California had the most cases overall. On a per capita basis, however, other states were hit harder with COVID-19. At its peak of infection, which state had the most cases per thousand residents?
Q20. Most states in the Northeast and the South and Southwest had two clear peaks of infection. What is the main difference between those states and NY, in terms of the dates when the peaks occurred? Recalling that New York City was the nation’s first COVID-19 hotspot, what is a simple explanation for this difference?
Again, it bears emphasizing that the SIR model is too simple to accurately describe the complexities of the COVID-19 pandemic in the United States. That said, the SIR model is a foundation for more realistic models which scientists use to predict outcomes and inform public policy.
For up-to-date estimates of the effective reproductive number \(R_e\) for all states, obtained using much more sophisticated epidemiological modeling, check out this website by the Centre for Mathematical Modeling of Infectious Diseases.
Before moving on to the next topic, take a moment to contemplate the enormous benefit to the public that disease models like the SIR and its more sophisticated cousins can provide if they can accurately describe disease dynamics and help inform science-based pandemic responses. Indeed, this paper does just that, quantifying the impact of mass gatherings on infection rates, and even predicting the disproportional impact on lower-income and non-white communities in American cities without having been fed such demographic data. In the authors’ words, “By capturing who is infected at which locations, our model supports detailed analyses that can inform more effective and equitable policy responses to COVID-19.”
Figure: Malaria parasite attaching to a human red blood cell, by NIAID. Anopheles stephensi, one of the mosquito species which transmit the disease to humans, by Jim Gathany. Two researchers who received the Nobel Prize for Physiology or Medicine for their work on malaria: Ronald Ross (1902), Tu Youyou (2015, photo by A. Mahmoud).
Malaria is one of the biggest killers of all infectious diseases worldwide, second only to tuberculosis and HIV/AIDS. Over 90% of the cases and deaths are in Africa. The first time malaria received a mathematical treatment was in 1911 by Ronald Ross, a British doctor. His model, like many we have seen in this course, oversimplifies reality, but provided the first insights into the most effective way to combat the disease, and ultimately served as scaffolding for more realistic models in use today.
The model divides both the human and the mosquito populations into infected and uninfected individuals. The model describes how the proportion of infected humans \(I_h\) and infected mosquitoes \(I_v\) changes over time. The basic idea is that humans acquire the disease in proportion to both the number of infected mosquitoes and the number of uninfected humans, and return to the uninfected class through recovery. Meanwhile, mosquitoes acquire the disease in proportion to the number of infected humans (since mosquitoes acquire the disease by biting infected humans) and the proportion of uninfected mosquitoes, and also return to the uninfected class through recovery. This model ignores, among other things, host and vector mortality.
In mathematical terms, the model reads
\[ \frac{dI_h}{dt} = \alpha I_v(1-I_h)-\gamma I_h\\ \frac{dI_v}{dt} = \beta I_h(1-I_v)-\delta I_v \]
The model has four parameters, \(\alpha, \beta, \gamma, \delta\), all of which are positive numbers. \(\gamma\) and \(\delta\) are immediately recognizable as the recovery rates of humans and mosquitoes, respectively. We will look into the meaning of \(\alpha\) and \(\beta\) momentarily.
How many equilibrium solutions does this model have?
To find out, we look for values of \(I_h\) and \(I_v\) in terms of the parameters \(\alpha, \beta, \gamma, \delta\) such that \(\frac{dI_h}{dt}=0\) and \(\frac{dI_v}{dt}=0\). A bit of algebra (or just plugging this in WolframAlpha) reveals that there are only two possibilities: either both \(I_h\) and \(I_v\) are equal to zero, in which case there is no disease, or
\[ I_h = \frac{\alpha\beta - \gamma\delta}{\alpha\beta + \beta\gamma} \hspace{2cm} \textrm{and} \hspace{2cm} I_v = \frac{\alpha\beta - \gamma\delta}{\alpha\beta + \alpha\delta} \]
In this latter case, if certain conditions are met about the parameters (see Q21), then a positive percentage of humans and mosquitoes are always infected, which means malaria is endemic to the host and vector populations.
The malaria-free equilibrium (\(I_h = 0, I_v = 0\)) is always an equilibrium: as long as no human and no mosquito carries malaria, the disease can never establish. By contrast, the endemic malaria equilibrium may not exist, depending on the values of the parameters.
When the endemic equilibrium does not exist, the malaria-free equilibrium is the only equilibrium, and any initial incidence of malaria will eventually go away. However, when the endemic equilibrium exists, it is stable, meaning that any initial proportion of malaria in the population will eventually end in this equilibrium.
Obviously, we wish to understand what combinations of parameter values leads to the desirable situation where the endemic malaria equilibrium does not exist.
Q21. Upon examination of the expressions above for the endemic malaria equilibrium, what needs to be true of the parameters for this equilibrium to exist?
Your answer to Q21 reveals an important ratio, which must be greater than 1 for malaria to establish in the population. If that sounds familiar, it is because this ratio is the basic reproductive number of malaria!
Because it is such an important quantity, we want to understand how we can keep it below 1. To do that, we must understand what goes into it.
What biological rates make up the parameters \(\alpha\) and \(\beta\)? \(\alpha\) is the rate at which an infected mosquito transmits the disease to an uninfected human. It will therefore contain the product of the mosquito biting rate \(b\) – i.e. how often a mosquito seeks out a blood meal – and the probability \(p\) that a bite by an infected mosquito will transmit the pathogen. Also, mosquitoes have a fixed number of blood meals per unit of time regardless of the number of people around. Because of this fact, \(\alpha\) works out to also being proportional to the number of mosquitoes per human, \(m\) (i.e. the ratio of the mosquito population to the human population). \(\beta\) is also proportional to the mosquito biting rate, and to the probability \(q\) that biting an infected human will transmit the pathogen to the mosquito. Putting it all together, we have \(\alpha = bpm, \; \beta = bq\).
Combining the above with your results for Q21, we get that the basic reproductive number of malaria according to the Ross-Macdonald model is
\[ R_0 = \frac{b^2pqm}{\gamma\delta} \]
This combination of parameters makes intuitive sense: the reproductive number increases with the biting rate, probability of disease transmission to and from the mosquito, and number of mosquitoes per person, and decreases with the recovery rate of humans and mosquitoes. Interestingly, \(R_0\) grows quadratically on the biting rate. This means that if the mosquito doubles its biting rate, the \(R_0\) of malaria increases by a factor of four!
| Symbol | Meaning |
|---|---|
| b | Biting rate of the mosquito, i.e. how often a mosquito seeks a blood meal |
| p | Probability that a bite by an infected mosquito will successfully transmit malaria |
| q | Probability that a bite to an infected human will transmit the pathogen to the mosquito |
| m | Ratio of the mosquito population to the human population |
| \(\alpha\) | Rate at which an infected mosquito transmits malaria to an uninfected human |
| \(\beta\) | Rate at which an uninfected mosquito acquires the pathogen from an infected human |
| \(\gamma\) | Recovery rate of the mosquito |
| \(\delta\) | Recovery rate of humans |
We can use this result to inform mitigation strategies. For example, if an anti-malarial drug increases recovery rates of patients by 20%, the model would predict a reduction in the \(R_0\) to approximately 83% of its original value:
\[ R_e = \frac{b^2pqm}{\gamma \, (1.2\delta)} = \frac{1}{1.2}R_0 = 83\% R_0 \]
This means that if implemented at scale, the drug would reduce the disease’s basic reproductive number by 17% of its original value. By comparison, if another drug reduces the probability that a person bitten by an infected mosquito contracts malaria by 15%, then we would predict a reduction to 85% of the original \(R_0\) (it would reduce \(R_0\) by 15%).
\[ R_e = \frac{b^2p(0.85q)m}{\gamma\delta} = 0.85\,R_0 = 85\% R_0 \]
We could then conclude that Option 2 is only cost-beneficial compared to Option 1 if it is easier to implement at scale (for example by costing less) by a factor of at least 15/17 = 88%. In other words, Option 2 would need to cost at most 88% of Option 1 (ie. it would need to be at least 12% cheaper than Option 1) to be cost-beneficial.
Note: The statement above about cost-effectiveness is a simplification. It comes from approximating the impact of a mitigation strategy on the \(R_0\) of the disease via the effective reproductive number, calculated with the expression \(R_e=(1-\frac{e}{c})R_0\), where \(e\) is the per-capita relative effect of the measure (in the case of Options 1 and 2 above, 17% and 15% respectively) and \(c\) is the cost in dollars, assumed to be much higher than 1. This expression yields the formula \(c_A<\frac{e_A}{e_B}c_B\) if Option A is to be more cost-effective than Option B.
Q22. Suppose you had a fixed budget to fund a program to combat malaria. Your consultants offer you three options, each of which would consume your entire budget: a) purchasing and distributing insecticide-treated mosquito nets to drape over beds, which could reduce biting rates by 40%, b) releasing genetically modified sterile males which would decimate the mosquito population by 50%, or c) purchasing and distributing a drug that reduced by 60% the probability of contracting the disease upon being bitten by an infected mosquito. Considering these options’ respective predicted impact on \(R_0\) according to the Ross-Macdonald model, which should you choose? How much cheaper would each of the other two options need to be to be cost-beneficial?
The code below defines R functions that simulate the Ross-Macdonald model and plot the time-evolution of the disease, starting from an initial state where a certain percentage of people and mosquitoes are infected with malaria.
RM_Model =
function(
initial_IH,
initial_IV,
mosquitoes_per_person,
human_recovery_rate,
mosquito_recovery_rate,
biting_rate,
human_infection_probability,
mosquito_infection_probability,
final_time,
time_step
){
RossMacdonald =
function(t, state, parameters){
with(as.list(c(state, parameters)), {
dIHdt = alpha * IV * (1 - IH) - gamma * IH
dIVdt = beta * IH * (1 - IV) - delta * IV
list(c(dIHdt, dIVdt))
})
}
times = seq(0, final_time, by = time_step)
parameters =
c(
alpha = biting_rate * human_infection_probability * mosquitoes_per_person,
beta = biting_rate * mosquito_infection_probability,
gamma = human_recovery_rate,
delta = mosquito_recovery_rate
)
state =
c(
IH = initial_IH,
IV = initial_IV
)
out = ode(y = state, times = times, func = RossMacdonald, parms = parameters)
return(
list(
parameters =
list(
mosquitoes_per_person,
human_recovery_rate,
mosquito_recovery_rate,
biting_rate,
human_infection_probability,
mosquito_infection_probability,
final_time,
time_step
),
initial_conditions = list(initial_IH, initial_IV),
state = out
)
)
}
Plot_Endemism = function(model){
as.data.frame(model$state) %>%
ggplot(aes(time, 100 * IH)) +
geom_line(linewidth = 1, color = rgb(153/255, 0, 0)) +
labs(
x = 'time (months)',
y = 'percentage of population with malaria'
) +
theme(legend.position = 'none') +
coord_cartesian(ylim = c(0, 100)) +
ggtitle(paste0('endemism: ', round(100 * model$state[nrow(model$state),'IH']), '%'))
}
Consider a scenario where \(R_0\) is large enough to keep malaria endemic within a village, starting with 1% infection in both the host and vector populations:
Plot_Endemism(
model =
RM_Model(
initial_IH = .01,
initial_IV = .01,
mosquitoes_per_person = 20,
human_recovery_rate = .5,
mosquito_recovery_rate = .5,
biting_rate = .4,
human_infection_probability = .5,
mosquito_infection_probability = .5,
final_time = 200,
time_step = .1
)
)
Figure: Endemism of malaria, measured in percentage of the local human population infected with malaria at the steady state (i.e. after infection levels have stabilized).
Q23. Test your answers to Q22 by changing – in turn! – the appropriate parameter in the model corresponding to Options A, B, C. What happens to the degree of endemism in each scenario? Show your plots.
Q24. Even simplistic models have the power to refine our intuitions. Starting from the after-the-fact knowledge that the biting rate has an outsize impact on \(R_0\) and thinking backwards, explain why this makes (bio)logical sense. (Note: We are not looking for mathematical statements of fact about how the different parameters contribute linearly or quadratically to \(R_0\), but for the underlying reasons for this. Hint: It might help to think about why targeting the biting rate does two things at once.)
In class, we saw how some parasites and parasitoids manipulate the physiology or behavior of their hosts for their own benefit. Examples include arthropod-induced galls in plants and “zombie ants” infected with Ophiocordyceps fungi. Even the malaria parasite seems to manipulate its host. Infected mosquitoes bite less frequently than usual while in the early stage of infection when the parasite is not yet ready to infect people, but become particularly attracted to humans at late-stage infection. This facilitates spread of the Plasmodium (although this could actually be an adaptive response of the host to infection. Another example of host behavior that could be adaptive to the host but also the pathogen is sneezing).
Figure: Sneezing: A way to expel the pathogen, or a way for the pathogen to spread? Photo by Andrew Davidhazy/RIT
Q25. This 2016 review in Frontiers in Ecology and Evolution discusses many other examples of host manipulation by parasites in plants and animals, as well as open questions such as why there are relatively few reports of host manipulation by sexually-transmitted pathogens. Describe one example of host manipulation mentioned in this paper.
Figure: bacteriophages attacking and infecting a bacterium. From Shutterstock.
Q26. Describe hyperparasitism in your own words and explain one potential application of this phenomenon in agriculture, conservation biology, or medicine. Look up the research literature on Google Scholar and provide an example of one such application. Cite the paper you found and include its doi (digital object identifier).
\[ \color{red}{***\; \textrm{This is the end of Part 2 of Lab 7} \;***} \]